fix: properly join url and path when calculating full url#692
fix: properly join url and path when calculating full url#692Lake Mossman (lmossman) merged 1 commit intomainfrom
Conversation
👋 Greetings, Airbyte Team Member!Here are some helpful tips and reminders for your convenience. Testing This CDK VersionYou can test this version of the CDK using the following: # Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@lmossman/fix-url-path-joining#egg=airbyte-python-cdk[dev]' --help
# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch lmossman/fix-url-path-joiningHelpful ResourcesPR Slash CommandsAirbyte Maintainers can execute the following slash commands on your PR:
|
There was a problem hiding this comment.
Pull Request Overview
This PR fixes a bug in URL construction where full URLs from cursor pagination tokens were incorrectly concatenated with base URLs, causing request failures. The fix ensures proper URL joining by using the existing _join_url() method instead of naive string concatenation.
- Replaces string concatenation with
_join_url()method call in the_get_url()method - Adds comprehensive unit tests to validate the URL joining behavior with various scenarios
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| airbyte_cdk/sources/declarative/requesters/http_requester.py | Updates URL construction logic to use _join_url() method for proper URL handling |
| unit_tests/sources/declarative/requesters/test_http_requester.py | Adds test cases to validate URL joining behavior with different path scenarios |
📝 WalkthroughWalkthroughThe Changes
Sequence Diagram(s)sequenceDiagram
participant Test as Test Case
participant HttpRequester as HttpRequester
participant HttpClient as HTTP Client
Test->>HttpRequester: Initialize with url and path
Test->>HttpRequester: Make request
HttpRequester->>HttpRequester: _get_url (uses _join_url)
HttpRequester->>HttpClient: Send request with joined URL
HttpClient-->>Test: Return prepared request (assert URL)
Estimated code review effort🎯 2 (Simple) | ⏱️ ~7 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Would you like to consider looping in anyone else for this review, or does this cover the main stakeholders? Wdyt? Note ⚡️ Unit Test Generation is now available in beta!Learn more here, or try it out under "Finishing Touches" below. 📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (2)
🧰 Additional context used🧠 Learnings (1)📓 Common learnings⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
🔇 Additional comments (2)
✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
Christo Grabowski (ChristoGrab)
left a comment
There was a problem hiding this comment.
LGTM!
…)" This reverts commit 950acc6.
…)" This reverts commit 950acc6.
What
A user raised a bug with the current logic for handling
Request Pathcursor page token injection: https://airbyte1416.zendesk.com/agent/tickets/13510Basically, if the "next page token" contains a full URL which contains a different domain than the
urlfield of the stream's HttpRequester, we currently just naively concatenate the two together, which usually results in a failure.Here is an example from the zendesk ticket above:
https://globalus251.dayforcehcm.com/api/parachute/v1/Reports/job_postinghttps://globalus251.dayforcehcm.com:443/api/parachute/v1/Reports/job_posting?25c92c37-2ba7-468c-89f9-834f6c050ddc=2024-01-01T00%3a00%3a00.000000&cursor=UhsMQmz0rZtvj7oCJ1e7DLe2O7v5nb5Gco1YD9NCTtU%253D:443after the.com, but the API Endpoint URL doesn'thttps://globalus251.dayforcehcm.com/api/parachute/v1/Reports/job_postinghttps://globalus251.dayforcehcm.com:443/api/parachute/v1/Reports/job_posting?25c92c37-2ba7-468c-89f9-834f6c050ddc=2024-01-01T00%3A00%3A00.000000&cursor=UdaBrnaiXuKw%252BFyD6FzERZba7d%252FPA0HuIoON3QbfZwI%253D&pageSize=5(which is just the concatenation of the API Endpoint URL and the next page token)How
To fix this, I simply modify the logic to call
_join_url()on theurlandpathinstead of naively concatenating them together.This fixes the issue, because
_join_url()will prefer thepathif it contains its own full http scheme and domain, which is what we want in this case.This also has a side-benefit of correctly handling the case where the
urldoes not have a trailing/and thepathdoes not have a leading/- the old implementation would not insert a/between these, whereas the new implementation does.Testing
I have added unit tests to validate this fix, and you can reproduce the situation with this manifest: https://gist.github.com/lmossman/404b656c3e5726ddebc026eae118b7f8
Summary by CodeRabbit
Bug Fixes
Tests